On clustering network-valued data

Neural Information Processing Systems

Community detection, which focuses on clustering nodes or detecting communities in (mostly) a single network, is a problem of considerable practical interest and has received a great deal of attention in the research community. While being able to cluster within a network is important, there are emerging needs to be able to cluster multiple networks. This is largely motivated by the routine collection of network data that are generated from potentially different populations. These networks may or may not have node correspondence. When node correspondence is present, we cluster networks by summarizing a network by its graphon estimate, whereas when node correspondence is not present, we propose a novel solution for clustering such networks by associating a computationally feasible feature vector to each network based on the trace of powers of the adjacency matrix. We illustrate our methods using both simulated and real data sets, and theoretical justifications are provided in terms of consistency.
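
To make the idea of a feature vector built from the trace of powers of the adjacency matrix concrete, here is a minimal sketch. The function name `trace_power_features`, the scaling of the adjacency matrix by the number of nodes, and the choice of powers 2 through `max_power` are assumptions made for illustration; the abstract does not specify them, and this is not presented as the authors' exact construction. The point it shows is that such features do not depend on how nodes are labeled, so they can be compared across networks without node correspondence.

```python
import numpy as np


def trace_power_features(adj, max_power=4):
    """Return [tr((A/n)^k) for k = 2..max_power] for an adjacency matrix A.

    Dividing by n (the number of nodes) is an illustrative normalization,
    not something stated in the abstract.
    """
    a = np.asarray(adj, dtype=float)
    n = a.shape[0]
    scaled = a / n
    power = scaled.copy()  # holds (A/n)^1 before the loop starts
    feats = []
    for _ in range(2, max_power + 1):
        power = power @ scaled  # advance to the next power
        feats.append(np.trace(power))
    return np.array(feats)


# A 3-node path and a relabeled copy of it give identical features,
# because the trace is invariant under simultaneous row/column permutation.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
perm = [2, 0, 1]
relabeled = path[np.ix_(perm, perm)]
same = np.allclose(trace_power_features(path), trace_power_features(relabeled))
```

The resulting vectors could then be passed to any standard clustering routine; which routine to use is left open here.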




Barycentric subspace analysis of network-valued data

Maignant, Elodie, Pennec, Xavier, Trouvé, Alain, Calissano, Anna

arXiv.org Machine Learning

Certain data are naturally modeled by networks or weighted graphs, be they arterial networks or mobility networks. When there is no canonical labeling of the nodes across the dataset, we talk about unlabeled networks. In this paper, we focus on the question of dimensionality reduction for this type of data. More specifically, we address the issue of interpreting the feature subspace constructed by dimensionality reduction methods. Most existing methods for network-valued data are derived from principal component analysis (PCA) and therefore rely on subspaces generated by a set of vectors, which we identify as a major limitation in terms of interpretability. Instead, we propose to implement the method called barycentric subspace analysis (BSA), which relies on subspaces generated by a set of points. In order to provide a computationally feasible framework for BSA, we introduce a novel embedding for unlabeled networks where we replace their usual representation by equivalence classes of isomorphic networks with that by equivalence classes of cospectral networks. We then illustrate BSA on simulated and real-world datasets, and compare it to tangent PCA.
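
The difference between equivalence classes of isomorphic networks and equivalence classes of cospectral networks can be illustrated with a small sketch. It assumes that "cospectral" refers to the spectrum of the adjacency matrix and uses the sorted eigenvalues as a stand-in representation; the helper name `adjacency_spectrum` is hypothetical, and the paper's actual embedding may differ. The example uses a well-known pair of graphs that share a spectrum without being isomorphic: the star with four leaves, and a 4-cycle plus an isolated node.

```python
import numpy as np


def adjacency_spectrum(adj):
    """Sorted eigenvalues of a symmetric adjacency matrix.

    Relabeling the nodes leaves this vector unchanged, so it is
    well defined for unlabeled networks.
    """
    return np.linalg.eigvalsh(np.asarray(adj, dtype=float))


# Star K_{1,4}: node 0 is connected to nodes 1..4.
star = np.zeros((5, 5))
star[0, 1:] = 1
star[1:, 0] = 1

# 4-cycle on nodes 0..3 plus the isolated node 4.
cycle_plus_point = np.zeros((5, 5))
for i in range(4):
    j = (i + 1) % 4
    cycle_plus_point[i, j] = 1
    cycle_plus_point[j, i] = 1

# Same spectrum, although the graphs are not isomorphic
# (the star is connected, the other graph is not).
cospectral = np.allclose(adjacency_spectrum(star),
                         adjacency_spectrum(cycle_plus_point))
```

Isomorphic networks are always cospectral, but as this pair shows the converse fails, so the cospectral classes are coarser than the isomorphism classes.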


Graphon based Clustering and Testing of Networks: Algorithms and Theory

Sabanayagam, Mahalakshmi, Vankadara, Leena Chennuru, Ghoshdastidar, Debarghya

arXiv.org Machine Learning

Network-valued data are encountered in a wide range of applications, and pose challenges in learning due to their complex structure and absence of vertex correspondence. Typical examples of such problems include classification or grouping of protein structures and social networks. Various methods, ranging from graph kernels to graph neural networks, have been proposed that achieve some success in graph classification problems. However, most methods have limited theoretical justification, and their applicability beyond classification remains unexplored. In this work, we propose methods for clustering multiple graphs, without vertex correspondence, that are inspired by the recent literature on estimating graphons, symmetric functions corresponding to the infinite-vertex limit of graphs. We propose a novel graph distance based on sorting-and-smoothing graphon estimators. Using the proposed graph distance, we present two clustering algorithms and show that they achieve state-of-the-art results. We prove the statistical consistency of both algorithms under Lipschitz assumptions on the graph degrees. We further study the applicability of the proposed distance for graph two-sample testing problems. Machine learning on graphs has evolved considerably over the past two decades. The traditional view towards network analysis is limited to modelling interactions among entities of interest, for instance social networks or the World Wide Web, and learning algorithms based on graph theory have been commonly used to solve these problems (Von Luxburg, 2007; Yan et al., 2006).
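
A rough sketch of what a distance based on a sorting-and-smoothing graphon estimator might look like is given below. It is a simplified illustration under stated assumptions, not the estimator or distance defined in the paper: nodes are sorted by degree, the sorted adjacency matrix is averaged over a fixed grid of blocks, and two graphs are compared by the scaled Frobenius norm of the difference between their block averages. The names `sort_and_smooth`, `graphon_distance`, and `n_blocks` are hypothetical, and each graph is assumed to have at least `n_blocks` nodes.

```python
import numpy as np


def sort_and_smooth(adj, n_blocks):
    """Block-averaged adjacency matrix after sorting nodes by degree."""
    a = np.asarray(adj, dtype=float)
    # Sort nodes from highest to lowest degree.
    order = np.argsort(-a.sum(axis=1), kind="stable")
    a = a[np.ix_(order, order)]
    # Split the sorted node indices into n_blocks nearly equal groups.
    groups = np.array_split(np.arange(a.shape[0]), n_blocks)
    hist = np.empty((n_blocks, n_blocks))
    for i, gi in enumerate(groups):
        for j, gj in enumerate(groups):
            hist[i, j] = a[np.ix_(gi, gj)].mean()  # smoothing step
    return hist


def graphon_distance(adj1, adj2, n_blocks=2):
    """Scaled Frobenius distance between two block-averaged estimates."""
    h1 = sort_and_smooth(adj1, n_blocks)
    h2 = sort_and_smooth(adj2, n_blocks)
    return np.linalg.norm(h1 - h2) / n_blocks


# Usage: a 4-node star and a relabeled copy are at distance zero.
star = np.zeros((4, 4))
star[0, 1:] = 1
star[1:, 0] = 1
perm = [3, 0, 1, 2]
d_same = graphon_distance(star, star[np.ix_(perm, perm)])
```

Because every graph is reduced to an `n_blocks` by `n_blocks` array, graphs with different numbers of nodes can be compared directly. Ties in degree can make the sorted order ambiguous in general; the star is used in the usage lines because its tied nodes are interchangeable.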


A Multilayer Correlated Topic Model

Tian, Ye

arXiv.org Machine Learning

We proposed a novel multilayer correlated topic model (MCTM) to analyze how the main ideas inherit and vary between a document and its different segments, which helps understand an article's structure. The variational expectation-maximization (EM) algorithm was derived to estimate the posterior and parameters in MCTM. We introduced two potential applications of MCTM, including the paragraph-level document analysis and market basket data analysis. The effectiveness of MCTM in understanding the document structure has been verified by the great predictive performance on held-out documents and intuitive visualization. We also showed that MCTM could successfully capture customers' popular shopping patterns in the market basket analysis.


On clustering network-valued data

Mukherjee, Soumendu Sundar, Sarkar, Purnamrita, Lin, Lizhen

Neural Information Processing Systems

Community detection, which focuses on clustering nodes or detecting communities in (mostly) a single network, is a problem of considerable practical interest and has received a great deal of attention in the research community. While being able to cluster within a network is important, there are emerging needs to be able to cluster multiple networks. This is largely motivated by the routine collection of network data that are generated from potentially different populations. These networks may or may not have node correspondence. When node correspondence is present, we cluster networks by summarizing a network by its graphon estimate, whereas when node correspondence is not present, we propose a novel solution for clustering such networks by associating a computationally feasible feature vector to each network based on the trace of powers of the adjacency matrix. We illustrate our methods using both simulated and real data sets, and theoretical justifications are provided in terms of consistency.